NSF PAR Search | NSF Public Access Repository

Finer Metagenomic Reconstruction via Biodiversity Optimization

https://doi.org/10.1101/2020.01.23.916924

Foucart, Simon; Koslicki, David (October 2020, Advances in Neural Information Processing Systems 33 (NeurIPS 2020))
Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (Ed.)
When analyzing communities of microorganisms from their sequenced DNA, an important task is taxonomic profiling: enumerating the presence and relative abundance of all organisms, or merely of all taxa, contained in the sample. This task can be tackled via compressive-sensing-based approaches, which favor communities featuring the fewest organisms among those consistent with the observed DNA data. Despite their successes, these parsimonious approaches sometimes conflict with biological realism by overlooking organism similarities. Here, we leverage a recently developed notion of biological diversity that simultaneously accounts for organism similarities and retains the optimization strategy underlying compressive-sensing-based approaches. We demonstrate that minimizing biological diversity still produces sparse taxonomic profiles and we experimentally validate superiority to existing compressive-sensing-based approaches. Despite showing that the objective function is almost never convex and often concave, generally yielding NP-hard problems, we exhibit ways of representing organism similarities for which minimizing diversity can be performed via a sequence of linear programs guaranteed to decrease diversity. Better yet, when biological similarity is quantified by k-mer co-occurrence (a popular notion in bioinformatics), minimizing diversity actually reduces to one linear program that can utilize multiple k-mer sizes to enhance performance. In proof-of-concept experiments, we verify that the latter procedure can lead to significant gains when taxonomically profiling a metagenomic sample, both in terms of reconstruction accuracy and computational performance.
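As a rough illustration of what k-mer-based similarity between organisms can look like, the sketch below compares two sequences by the cosine similarity of their k-mer count vectors. This is an assumption made for illustration, not the paper's definition of k-mer co-occurrence; the function names and the toy sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every length-k substring (k-mer) of a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_similarity(seq_a, seq_b, k):
    """Cosine similarity between the k-mer count vectors of two sequences.

    Illustrative choice only: other normalizations are equally plausible.
    """
    a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    # Counter returns 0 for k-mers that are absent from b.
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sum(c * c for c in a.values()) ** 0.5
    norm_b = sum(c * c for c in b.values()) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sequence shorter than k has no k-mers
    return dot / (norm_a * norm_b)


def similarity_matrix(genomes, k):
    """Pairwise similarity for a dict mapping organism name -> sequence."""
    names = list(genomes)
    return [[kmer_similarity(genomes[r], genomes[c], k) for c in names]
            for r in names]


# Hypothetical toy sequences, made up for this example.
toy_genomes = {"org_a": "ACGTACGTAC", "org_b": "ACGTACGTTT", "org_c": "GGGGCCCCGG"}
matrix = similarity_matrix(toy_genomes, k=3)
```

Calling the function with several values of k gives one similarity matrix per k-mer size, which is the kind of input a multi-size approach would combine.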
Full Text Available
The Flajolet-Martin Sketch Itself Preserves Differential Privacy: Private Counting with Minimal Space

Smith, Adam; Song, Shuang; Thakurta, Abhradeep (December 2020, Advances in Neural Information Processing Systems 33 pre-proceedings (NeurIPS 2020))
Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M. F.; Lin, H. (Ed.)
Full Text Available
Network Diffusions via Neural Mean-Field Dynamics

He, Shushan; Zha, Hongyuan; Ye, Xiaojing (December 2020, Advances in neural information processing systems)
Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M. F.; Lin, H. (Ed.)
We propose a novel learning framework based on neural mean-field dynamics for inference and estimation problems of diffusion on networks. Our new framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities, which renders a delay differential equation with memory integral approximated by learnable time convolution operators, resulting in a highly structured and interpretable RNN. Directly using cascade data, our framework can jointly learn the structure of the diffusion network and the evolution of infection probabilities, which are cornerstones of important downstream applications such as influence maximization. Connections between parameter learning and optimal control are also established. An empirical study shows that our approach is versatile and robust to variations of the underlying diffusion network models, and significantly outperforms existing approaches in accuracy and efficiency on both synthetic and real-world data.
Full Text Available
Achieving Equalized Odds by Resampling Sensitive Attributes

Romano, Yaniv; Bates, Stephen; Candes, Emmanuel (January 2020, Advances in neural information processing systems)
Larochelle, H; Ranzato, M; Hadsell, R; Balcan, M.F.; Lin, H. (Ed.)
Full Text Available
Multiparameter Persistence Image for Topological Machine Learning

Carrière, Mathieu; Blumberg, Andrew (January 2020, Advances in neural information processing systems)
Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (Ed.)
In the last decade, there has been increasing interest in topological data analysis, a new methodology for using geometric structures in data for inference and learning. A central theme in the area is the idea of persistence, which in its most basic form studies how measures of shape change as a scale parameter varies. There are now a number of frameworks that support statistics and machine learning in this context. However, in many applications there are several different parameters one might wish to vary: for example, scale and density. In contrast to the one-parameter setting, techniques for applying statistics and machine learning in the setting of multiparameter persistence are not well understood due to the lack of a concise representation of the results. We introduce a new descriptor for multiparameter persistence, which we call the Multiparameter Persistence Image, that is suitable for machine learning and statistical frameworks, is robust to perturbations in the data, has finer resolution than existing descriptors based on slicing, and can be efficiently computed on data sets of realistic size. Moreover, we demonstrate its efficacy by comparing its performance to other multiparameter descriptors on several classification tasks.
Full Text Available
Variance reduction for Random Coordinate Descent-Langevin Monte Carlo

Ding, Zhiyan; Li, Qin (January 2020, Advances in neural information processing systems)
Larochelle, H; Ranzato, M; Hadsell, R; Balcan, M.F.; Lin, H (Ed.)
Full Text Available
Minibatch Stochastic Approximate Proximal Point Methods

Asi, Hilal; Chadha, Karan; Cheng, Gary; Duchi, John C. (January 2020, Advances in neural information processing systems)
Larochelle, H; Ranzato, M; Hadsell, R; Balcan, M; Lin, H. (Ed.)
Full Text Available
Latent Dynamic Factor Analysis of High-Dimensional Neural Recordings

Bong, Heejong; Liu, Zongge; Ren, Zhao; Smith, Matthew; Ventura, Valerie; Kass, Robert E. (January 2020, Advances in neural information processing systems)
Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (Ed.)
High-dimensional neural recordings across multiple brain regions can be used to establish functional connectivity with good spatial and temporal resolution. We designed and implemented a novel method, Latent Dynamic Factor Analysis of High-dimensional time series (LDFA-H), which combines (a) a new approach to estimating the covariance structure among high-dimensional time series (for the observed variables) and (b) a new extension of probabilistic CCA to dynamic time series (for the latent variables). Our interest is in the cross-correlations among the latent variables which, in neural recordings, may capture the flow of information from one brain region to another. Simulations show that LDFA-H outperforms existing methods in the sense that it captures target factors even when within-region correlation due to noise dominates cross-region correlation. We applied our method to local field potential (LFP) recordings from 192 electrodes in Prefrontal Cortex (PFC) and visual area V4 during a memory-guided saccade task. The results capture time-varying lead-lag dependencies between PFC and V4, and display the associated spatial distribution of the signals.
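To make the idea of a lead-lag dependency concrete, here is a minimal sketch of a plain sample cross-correlation between two time series, scanned over a range of lags. It is not LDFA-H and makes no use of latent factors; the function names and the toy signals are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag]; a positive lag means x leads y."""
    if lag >= 0:
        xs, ys = x[:len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[:len(y) + lag]
    return pearson(xs, ys)


def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the highest correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: lagged_correlation(x, y, lag))


# Hypothetical toy signals: region_b repeats region_a two steps later.
region_a = [0, 1, 0, 0, 2, 0, 0, 1, 0, 0, 0, 3, 0, 0]
region_b = [0, 0] + region_a[:-2]
lead = best_lag(region_a, region_b, max_lag=3)
```

In this toy case the scan recovers a lag of 2, meaning the first signal leads the second by two time steps.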
Full Text Available
Classification with Valid and Adaptive Coverage

Romano, Yaniv; Sesia, Matteo; Candes, Emmanuel (January 2020, Advances in Neural Information Processing Systems)
Larochelle, H; Ranzato, M; Hadsell, R; Balcan, M; Lin, H. (Ed.)
Full Text Available
The Potts-Ising model for discrete multivariate data

Razaee, Zahra; Amini, Arash A. (January 2020, NeurIPS 2020)
Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (Ed.)
Full Text Available
